```r
# set the root to the project directory, not the rmd directory
knitr::opts_knit$set(root.dir = rprojroot::find_rstudio_root_file())
# default to not printing the code
knitr::opts_chunk$set(echo = FALSE)

# libraries
library(tidyverse)
library(sf)
library(stars)
library(tmap)
library(transformr) # not needed for this doc at present?
library(gganimate)  # not needed here at present
library(viridis)
library(colorspace)
library(patchwork)
library(here)

# originally run under R 4.0.x on Windows 10 x64
```
As with the local version, I’m going to develop the plots here so I can see what I’ve done and what each bit of code does. The tradeoff is that this is a bit clunkier to code in and can take a long time to load.
I’m basing all of this on the annual data, I think, though the bimonthly is read in if we want to do that.
Some of the scaling done by the plot functions should probably be cleaned up a bit, especially for the summed values- the breaks and whatnot are fine, but the units should probably be adjusted by several orders of magnitude.
The values are also not to be trusted- this is a capacity demonstration, and something like the yearly sum of bimonthly ER estimated just for the maximum extent is not an informative unit; the values say little about what’s actually going on in the system with metabolism. We really need better units that make the point that this is a capacity demo, and that accentuate one of the meta-benefits of the model: identifying where and why the output managers really want can’t yet be produced, and what needs to be done to get there.
I think I’ll start out with tmap and ggplot again, but probably switch to just ggplot quickly since tmap and rmarkdown aren’t friends.
Now that I made the functions, the local code should effectively just work.
And a test- can I output to the RStudio viewer? Looks like no. I’m trying this so I can use tmap’s view mode in here, since it doesn’t play nicely with the notebook.
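For reference, a minimal sketch of the two tmap modes, using tmap’s bundled `World` dataset rather than the real catchments. View mode builds an interactive leaflet widget, which renders in the RStudio viewer and HTML output but not in static formats, which may be why it fails here.

```r
library(tmap)
data("World") # example polygons shipped with tmap

# Static mode: renders as a normal plot, knits fine
tmap_mode("plot")
tm_shape(World) + tm_polygons("pop_est")

# View mode: builds an interactive leaflet widget; works in the
# RStudio viewer and HTML documents, but not in static output
tmap_mode("view")
tm_shape(World) + tm_polygons("pop_est")
```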
The ggplot versions
I’m going to pull the titles off everything but temp, assuming they’re going to be smooshed together and get titled then, but leaving a useful prefix in the call in case I change my mind.
So, since tmap isn’t adding anything useful for the static plots, I’m going to just do ggplot from now on.
Forcing the min and max makes the breaks more reasonable, but does reduce resolution a bit. Not really sure what the best solution is there. It also lets datasets with different ranges match up better instead of ending up with slightly different splits.
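A sketch of what forcing shared limits might look like, assuming a log-transformed viridis fill scale; `squish` keeps out-of-range cells at the limit colour instead of dropping them to NA (the resolution cost mentioned above). The `forcemin`/`forcemax` names follow this doc’s usage; the wrapper itself is hypothetical.

```r
library(ggplot2)
library(scales)

# Hypothetical wrapper: identical limits -> identical breaks across
# datasets with different ranges. forcemin must be > 0 on a log scale.
scale_fill_forced <- function(forcemin, forcemax) {
  scale_fill_viridis_c(
    trans  = "log10",
    limits = c(forcemin, forcemax),
    breaks = scales::log_breaks()(c(forcemin, forcemax)),
    oob    = scales::squish # clip, don't drop, out-of-range values
  )
}
```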
THESE VALUES ARE A MESS AND SO THE UNITS ARE STUPID. Maybe just tonnes O2 assuming max extents? Or just lots/little somehow, i.e. make clear these are NOT real numbers? E.g. ‘Aggregated GPP’ and then not even have units?
Make those a grid a la the local version
The local worked better with a bottom legend; this seems to do better with a top one. Really, might want to shift to scientific notation or choose even bigger units.
Annoying that temp doesn’t get as smooshed by its legend and so is a different size. I keep hearing good things about patchwork?
That IS nicer. Print that out
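A minimal patchwork sketch of the grid-with-collected-guides idea (toy panels standing in for the real maps): `guides = "collect"` pools duplicate legends so one oversized legend no longer squeezes its panel relative to the others.

```r
library(ggplot2)
library(patchwork)

# Stand-in panels for the four map panels
p1 <- ggplot(mtcars, aes(wt, mpg,  colour = cyl)) + geom_point()
p2 <- ggplot(mtcars, aes(wt, disp, colour = cyl)) + geom_point()
p3 <- ggplot(mtcars, aes(wt, hp,   colour = cyl)) + geom_point()
p4 <- ggplot(mtcars, aes(wt, qsec, colour = cyl)) + geom_point()

# 2x2 grid; duplicate legends are collected to one side so every
# panel keeps the same plotting area
(p1 + p2) / (p3 + p4) + plot_layout(guides = "collect")
```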
The regression above discards a valley term in order to predict across the basin. That comes at the cost of more error. We can’t really say what that error is at this scale, but we CAN say we want to be more confident and cut things down.
One way to show that is simply to ignore those catchments where we can’t predict, putting an asterisk on them, effectively.
Need to use the forcemin and forcemax here and above (or, I suppose, could force to the max of the above) to end up with the same breaks. The clean factor-of-10 breaks are nice though.
We could package that up as a two-panel,
But it might be better to directly compare? either for both
Or a single outcome to keep figure sizes down
ER is actually a better example because they’re more clearly different.
Let’s print that out as an option
But what I think probably makes the most sense is to add it to the input-output figure and do it all at once?
Trying better legends
That’s OK, but really weirdly aligned. Maybe with some more tweaks to the way it’s specified?
I guess print that out so we have something,
That actually looks pretty OK printed. It’s a bit offset, but that’s fine, the columns don’t mean anything relative to the top row.
But maybe….
That lines up OK, but the legends are back to being a mess.
Rebecca had the idea of sort of combining the uncertainty panels to use less saturated colours for the un-parameterised valleys. That should be possible, but would take some playing to get the plotting to work. I’m going to focus for now on getting the rest of what we need done and come back to that, maybe post-report but pre-paper.
Can I just show a few years back to back in panels?
And can I do that with the functions? Will probably need them to accept more than one `datewanted`.
Not sure we really care if we do it for the inputs, gpp etc?
I think this is overkill, but make a panel version. It is not clear to me why the temps don’t get 4 rows?
That is overkill, I think.
Let’s print each of the variables and the big overkill plot
And the big smash
Maybe that actually IS the way to go. Do the whole smash of the inputs and outputs and yearly aggregation using this. Then do uncertainty separately.
Inputs too? maybe in the timeseries? maybe this is the scenario comparison too?
Make the input/output plots just as above for one year
Set up consistent labels
Print that out
Do these for the 5-year; then it’s not just an arbitrary year. Going to take a bit of experimentation.
Which way do I want it to go? IE what do I want on the chart? Could have bars for variables within catchments? Facetted variables? only one variable?
First, how do we make a bar chart from the stars? I might be able to do it directly from the stars, but I’m sure ggplot will like dealing with a df better. And we don’t need geometric info for these plots.
First, prep the df. Probably move this to the dataprep script.
BUT, we DO need to get the catchment names
Some testing of that. st_join is the way to go, but st_intersects yields 111 rows, and st_equals yields NAs for the second argument’s values. st_nearest_feature seems to work but assigns multiple catchments to the same name. st_equals_exact seems to work with a par argument, but needs testing. The comboplot doesn’t fit here in a readable way but can be checked in the main R console; flipping back and forth between the singles is enough to see they match.
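A toy sketch of the `st_equals_exact` join (hypothetical geometries, not the real catchments): two layers whose polygons differ only by floating-point noise, joined with a tolerance supplied via `par`, which `st_join` forwards to the join predicate.

```r
library(sf)

# A unit square at x-offset x0, optionally nudged by eps to mimic
# vertices that differ by floating-point noise between layers
sq <- function(x0, eps = 0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0)) + eps))
}

catchments <- st_sf(id = 1:2, geometry = st_sfc(sq(0), sq(2)))
named      <- st_sf(ValleyName = c("A", "B"),
                    geometry = st_sfc(sq(0, 1e-9), sq(2, 1e-9)))

# st_join passes extra arguments (here par, the tolerance) through
# to the join predicate
st_join(catchments, named, join = st_equals_exact, par = 1e-6)
```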
The 25 catchment (valley) names that come out of the join:

```
Avoca, Barwon Darling, Border Rivers, Broken, Campaspe, Castlereagh,
Central Murray, Condamine Balonne, Edward Wakool, Goulburn, Gwydir,
Kiewa, Lachlan, Loddon, Lower Darling, Lower Murray, Macquarie,
Mitta Mitta, Murrumbidgee, Namoi, Ovens, Paroo, Upper Murray,
Warrego, Wimmera
```
Now, on to actually constructing the things I need- join to get the names. I could st_drop_geometry, but not sure it’s needed?
I’ve gone and made a bunch of those in the Setup_basin file. Now to make some plots
Option 1: Each variable separately, maybe in a grid again.
Get the setups so I can match colors. Call by name here, because the column location moves in funny ways during the pivots.
Ordering by name is pointless. I’ll order by centroid latitude, for lack of anything obviously better. Position from mouth might be better, but that gets hard to calculate.
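The centroid-latitude ordering might be sketched like this, assuming `catch_sf` is the catchments sf object with a `ValleyName` column (object and column names are assumptions):

```r
library(sf)
library(dplyr)
library(forcats)

# Order catchments roughly north-to-south by centroid latitude.
# st_centroid warns on lon/lat polygons but is fine for ordering.
catch_ord <- catch_sf %>%
  mutate(latpos = st_coordinates(st_centroid(geometry))[, "Y"]) %>%
  mutate(ValleyName = fct_reorder(ValleyName, latpos))
```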
Temp
Inundation. Here we really see that position from mouth would be good; the Lower Murray’s centroid is actually above a lot of the others in Vic.
It’s also clear that the color scale could use some better breaks, but it’s unclear what those would be to keep consistent. This can all probably be turned into functions also, but for now let’s sort out what they look like and then shift this over before we do the scenarios (and if we decide these do what we want).
NEED to sort out the y-breaks and labels if we change the limits; the y-scale needs to start at 1.
Huh. why did I set those limits?
Lower Murray ER does go above 100,000,000, but there’s still gotta be a better set of breaks.
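One sketch of a y scale that starts at 1 with clean powers-of-ten breaks up to the Lower Murray ceiling; scientific labels are an assumption, pick whatever label style wins out.

```r
library(ggplot2)
library(scales)

scale_y_log10(
  limits = c(1, 1e9),   # start at 1 so the log10 baseline is 0
  breaks = 10^(0:9),    # clean factor-of-10 breaks
  labels = scales::label_scientific()
)
```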
Make a multipanel. Legends blow it up, even when collected. Need to think about that.
If I delete the labs, does it help?
I could do the upside-down histogram a la local. Or something. I don’t really think plotting multiple values per catchment is a good idea, but I’ll give it a quick try before moving on.
First need to glue the data together. Annoyingly complicated
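The gluing could be a `reduce(full_join, ...)` over the per-variable data frames (`gppdf` appears later in this doc; `erdf` and `inundf` are assumed names), provided each carries its own distinctly named value column:

```r
library(dplyr)
library(purrr)
library(tidyr)

# Join GPP, ER, and inundation on the shared keys, then pivot long
# so ggplot can map variable to colour or facet
combined <- list(gppdf, erdf, inundf) %>%
  reduce(full_join, by = c("ValleyName", "latpos")) %>%
  pivot_longer(-c(ValleyName, latpos),
               names_to = "variable", values_to = "value")
```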
Ignore temp because it’s on a different scale. Rather than trying to have varying colors, let’s use y to do that, and just color code by variable type; too confusing otherwise. I’ve just chosen some OK-looking values from the scales above.
Print out a lot of those bars and move on
For the timeseries, I might as well use the bimonthly rather than the yearly? Maybe.
Again, let’s figure out how to make this here, then do it for everything in the data code.
That’s pretty cool, but the seasonality is going to make it hard to see what’s happening
Does an annual version look better? Yes. Tried to get fancy with labels; needs work.
That annual version probably is better. Depends on what we’re trying to show, I guess.
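The annual version amounts to collapsing the bimonthly series to yearly sums, something like this (data frame and column names are assumptions):

```r
library(dplyr)
library(lubridate)

# Sum the bimonthly values within each catchment-year; annual sums
# smooth out the seasonality that dominates the bimonthly plot
gpp_annual <- gpp_bimonthly %>%
  group_by(ValleyName, year = year(date)) %>%
  summarise(value = sum(value, na.rm = TRUE), .groups = "drop")
```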
Push those all over to the data script and run for everything.
Bimonthly timeseries of everything. Not going to worry too much about labelling.
And a combo. Labels are terrible but needed. Interesting that these are NOT obviously seasonal.
And a combo. Labels are terrible but needed.
One option- the 5-year with a marginal histogram
what does that look like printed?
That actually looks pretty good.
What about a timeseries? either under that or alone?
Labelling the catchments is going to be terrible though. But I just need to make it.
As a demo, let’s just do one of the timeseries and strip the legend off.
That’s ugly and not very informative. What if I just give whole-basin sums for each year? IE have the bars for catchments, and a line over time for just the three variables?
Need to glue the data again, then sum over years (or vice versa)
Breaks need some cleanup.
what does that look like printed?
what does that look like printed?
Now can we combine with timeseries? Still not sure what to do with the legend. We need one (and a better color palette), but there are too many catchments to make it very clean. Need to sort something out there.
This will likely have some good options using column plots to look at the differences. I have no idea what I meant by the above sentence.
TRYING NEW LIMITS- CAREFUL: changed forcemin = 0 to 10000 based on the bar chart. Will need different limits for different aggregation levels, but that looks like it’d work for the 5-year. Tried turning forcemax down, but that was terrible.
The obvious thing to do is to build a 2x2 matrix- maybe with the inundation and temps as marginal explanations???
Let’s do it with the 5-years, since then I don’t have to choose a particular year. It would all work the same with single years (or bimonths or whatever).
The scale needs work. Ugh.
Same, but with ER
Even lamer scale. Really need to choose something else. What if I don’t force? They’ll end up different, probably, but maybe it will help choose.
Can I put the temp and inundations on there too?
Yeah, the colors for the logged stuff need to start a LOT higher when we’re summing over catchments. Bar charts still need to start at 0 on y though.
Will go back and sort that out; first, can I build the fig with the inputs? Took the legend off anyway; will need to add it back somehow.
Same, with ER
Huh. That’s better, but I still can’t see any changes with inundation. I wonder if I can force more levels in the plot functions?
I think I need to make some marginal bars here too.
It’s tempting to try to jam the bars into the factorial panels with color for ER and GPP, but I think it actually makes more sense to have them for one outcome with bar color for the situation (baseline, warming, inundation, both).
Let’s try to make that for GPP. Will first need to build a combo dataset and add names. I have the dfs already, right? Is this as easy as binding rows after appending a type col?
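It should be that easy: `bind_rows` with `.id` adds the type column while stacking (the scenario object names here are assumptions):

```r
library(dplyr)

# Stack the scenario data frames and tag each row with its source
gpp_scenarios <- bind_rows(
  baseline   = gpp_base,
  warming    = gpp_warm,
  inundation = gpp_inun,
  both       = gpp_both,
  .id = "scenario"
)
```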
Now, make a bar plot. Not a huge fan that the qualitative palettes all end up looking like the colours I’m using for GPP and ER, but for now it will have to do.
Make a combo
Needs a bit of cleanup, but it’s OK. Not sure why the new plot gets wonky when I shift the basesize stuff.
Same, ER
Make a combo
Needs a bit of cleanup, but it’s OK. Not sure why the new plot gets wonky when I shift the basesize stuff.
TODO: Go through and change the other forcemins, but check they make sense for the other datasets.
Figure out more levels.
What do I want to show here? for a single date, the bars and PI for the catchments with either completely correlated or uncorrelated errors.
The idea here is to make something for the box: say these are wrong, but they show the endmembers. The bar gives both the point estimates and the situation where the errors are uncorrelated (because there are so many ANAEs in each catchment, they end up law-of-large-numbering to the mean), and the error bars give the situation where errors are perfectly correlated.
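The two endmembers can be sketched with toy numbers: with many independent errors the standard error of the catchment sum grows as a square root, so the relative error collapses toward the mean, while perfectly correlated errors add linearly and keep their full relative size.

```r
# Toy per-ANAE point estimates and standard errors (not real values)
set.seed(1)
est <- rlnorm(1000, meanlog = 2) # per-wetland GPP point estimates
sdv <- 0.5 * est                 # per-wetland standard errors

total <- sum(est)
se_uncorrelated <- sqrt(sum(sdv^2)) # independent: add in quadrature
se_correlated   <- sum(sdv)         # perfectly correlated: add linearly

se_uncorrelated / total # small; law-of-large-numbers shrinkage
se_correlated / total   # 0.5, the full per-wetland relative error
```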
First, plot the bars, following what I did above for the 5-year but for a single bimonthly unit (I think we can make the point with one snapshot just fine, and that isolates the explanation to a spatial issue; we should avoid getting into how to do it over time, since aggregating bimonthly maxes is a bit nonsensical).
gppdf is the bimonthly, but I need to get something similar for the upper and lower and glue them together. Do that in the data script.
How is that so tight? The ranges are huge, since they’re logged. But is it because only some of the wetlands are ever wet, so even if they have a wide range, the overall fluctuation relative to the mean isn’t super huge? Maybe? Seems like they should be roughly equivalent to those in the Werai in terms of bar height to error ratio?
Plots
Glue together into a combo
Does that actually look better as matched bars?
Should I just toss inundation on there too? like in comboplot? Maybe this should just replace comboplot?
Could make one with the scenarios as well?
Print those